System for increasing the speed of a sum-of-absolute-differences operation

ABSTRACT

An adaptation of the sum-of-absolute-differences (SAD) calculation is implemented by modifying existing circuitry in a microprocessor. The adaptation yields a reduction of over 30% for a current SAD calculation. The adaptation includes a first and second operand register, each storing respectively a first and second set of 2's complement binary data, an arithmetic logic unit (ALU), and a destination register. An add/subtract enable input on the ALU receives a most significant bit (MSB) of the second set of binary data. The ALU adds the first and second data sets if the MSB is a “0” and subtracts the second data set from the first data set if the MSB is a “1.” The add/subtract enable input has the effect of taking the absolute value of the second data set without having to first perform an absolute value determination, thus eliminating processing steps.

TECHNICAL FIELD

The present invention relates to arithmetic operations in microprocessors, both load-store architectures (i.e., RISC machines) and memory-oriented architectures (i.e., CISC machines). Specifically, the present invention relates to arithmetic operations used in motion-estimation algorithms and other applications using a sum-of-absolute-differences (SAD) operation.

BACKGROUND ART

Video images have become an increasingly important part of communications in general. An ability to nearly instantaneously transmit still images, and particularly, live moving images, has greatly enhanced global communications.

In particular, videoconferencing systems have become an increasingly important business communication tool. These systems facilitate meetings between persons or groups of persons situated remotely from each other, thus eliminating or substantially reducing the need for expensive and time-consuming business travel. Since videoconference participants are able to see facial expressions and gestures of remote participants, richer and more natural communication is engendered. In addition, videoconferencing allows sharing of visual information, such as photographs, charts, and figures, and may be integrated with personal computer applications to produce sophisticated multimedia presentations.

To provide cost-effective video communication, bandwidth required to convey video must be limited. A typical bandwidth used for videoconferencing lies in the range of 128 to 1920 kilobits per second (Kbps). Problems associated with available videoconferencing systems in an attempt to cope with bandwidth limitations include slow frame rates resulting in a non-lifelike picture having an erratic, jerky motion, the use of small video frames or limited spatial resolution of a transmitted video frame, and a reduction in the signal-to-noise ratio of individual video frames. Conventionally, if one or more of these effects is undesirable, then higher bandwidths are required.

At 768 Kbps, digital videoconferencing, using state-of-the-art video encoding methods, produces a picture that may be likened to a scene from analog television. Typically, for most viewers, twenty-four frames per second (fps) are required to make video frames look fluid and give the impression that motion is continuous. As the frame rate is reduced below twenty-four fps, an erratic motion results. In addition, there is always a tradeoff between a video frame size required and available network capacity. Therefore, lower bandwidth requires a lower frame rate and/or reduced video frame size.

A standard video format used in videoconferencing, defined by resolution, is Common Intermediate Format (CIF). The primary CIF format is also known as Full CIF or FCIF. The International Telecommunication Union (ITU), based in Geneva, Switzerland (www.itu.ch), has established this communications standard. Additional standards with resolutions higher and lower than CIF have also been established. Resolution and bit rate requirements for various formats are shown in Table 1. Bit rates (in megabits per second, Mbps) shown are for uncompressed color frames where 12 bits per pixel is assumed.

Video compression is a way of encoding digital video to take up less storage space and reduce required transmission bandwidth. Certain compression/decompression (CODEC) schemes are frequently used to compress video frames to reduce the required transmission bit rates. Overall, CODEC hardware or software compresses digital video into a smaller binary format than required by the original (i.e., uncompressed) digital video format. As can be noted from Table 1, there is an extraordinarily large number of bits (e.g., nearly 584 million bits each second in a 16CIF format), and consequently, a tremendous amount of processing of the bits that must occur for effective video processing and motion estimation. Consequently, an ever-increasing application for microprocessors is CODEC processing.

TABLE 1
Resolution and bit rates for various CIF formats

CIF Format                 Resolution (in pixels)   Bit Rate at 30 fps (Mbps)
SQCIF (Sub Quarter CIF)    128 × 96                   4.4
QCIF (Quarter CIF)         176 × 144                  9.1
CIF (Full CIF, FCIF)       352 × 288                 36.5
4CIF (4 × CIF)             704 × 576                146.0
16CIF (16 × CIF)           1408 × 1152              583.9
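The bit rates in Table 1 follow directly from the stated assumptions (12 bits per pixel, 30 frames per second, uncompressed). The short Python sketch below reproduces them; the names `CIF_FORMATS` and `bit_rate_mbps` are illustrative only, and a megabit is taken to be 10^6 bits.

```python
# Uncompressed bit rate for each CIF format, using the assumptions
# stated for Table 1: 12 bits per pixel and 30 frames per second.
BITS_PER_PIXEL = 12
FRAMES_PER_SECOND = 30

# Resolutions (width, height) in pixels, as listed in Table 1.
CIF_FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def bit_rate_mbps(width, height):
    """Return the uncompressed bit rate in megabits per second (10**6 bits)."""
    return width * height * BITS_PER_PIXEL * FRAMES_PER_SECOND / 1e6


for name, (width, height) in CIF_FORMATS.items():
    print(f"{name:6s} {width:5d} x {height:<5d} {bit_rate_mbps(width, height):7.1f} Mbps")
```

For 16CIF the product is 583,925,760 bits per second, which is the "nearly 584 million bits each second" figure quoted in the text.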

Motion estimation algorithms are a significant part of the CODEC processing. A sum-of-absolute-differences (SAD) operation is frequently the cornerstone of most motion estimation algorithms. Based on the amount of processing required for every frame in a video image, the SAD operation is extremely computationally intensive. Therefore, the operation must be performed as quickly and efficiently as possible.

A governing equation of the SAD operation:

$\mathrm{SAD}(U,V) = \sum\limits_{x=0}^{15} \sum\limits_{y=0}^{15} \left| U(x,y) - V(x,y) \right|$

where U and V are image frames and x and y are 2-dimensional spatial coordinates. Despite its apparently innocuous and simple form, the SAD governing equation in its current formulation, when coupled with the tremendously high number of bits requiring processing, is extremely computationally intensive and thus places a limit on a temporal speed of motion estimation algorithms. Adding additional hardware (e.g., multiple processors, additional memory, additional registers) increases a speed of the computation but at a sacrifice of geometrical space considerations and cost-to-implement.

Therefore, although the SAD operation and other compression methods have proven somewhat effective, there remains a need to improve video quality over low bandwidth transmission channels while not significantly increasing space for hardware performing the calculations or a cost of the implementing hardware.

SUMMARY

The present invention implements an adaptation of the SAD governing equation in microprocessors by modifying hardware already present in all microprocessors. The adaptation yields a reduction of over 30% for a current SAD calculation by reducing required calculations in a normally time-intensive inner iteration loop.

In one exemplary embodiment, the present invention is a system for calculating sum-of-absolute-differences which includes a first and second operand register, each storing respectively a first and second set of binary data as 2's complement, an arithmetic logic unit, and a destination register.

The arithmetic logic unit has an add/subtract enable input to receive a most significant bit (MSB) of the second set of binary data. The MSB represents a sign of the second set of binary data. The arithmetic logic unit receives the first and second sets of binary data and adds the data sets if the MSB is a “0” and subtracts the second data set from the first data set if the MSB is a “1.” The add/subtract enable input has the effect of taking the absolute value of the second data set without having to first perform an absolute value determination, thus eliminating processing steps.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an exemplary embodiment of the present invention implemented in a microprocessor with an arithmetic logic unit and operand and destination registers.

FIG. 2 shows a specific exemplary embodiment of the present invention implemented in a microprocessor with an arithmetic logic unit and operand and destination registers.

DETAILED DESCRIPTION

An embodiment of the present invention implements an adaptation of the SAD governing equation by modifying hardware already present in all microprocessors. The governing equation of the SAD operation

$\mathrm{SAD}(U,V) = \sum\limits_{x=0}^{15} \sum\limits_{y=0}^{15} \left| U(x,y) - V(x,y) \right|$

is conventionally implemented (in pseudocode) as

for x = 0 to x = 15                    (1)
  for y = 0 to y = 15
    temp = U(x, y) - V(x, y)
    temp = abs(temp)
    SAD = SAD + temp
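As a runnable illustration of operation (1), the following Python sketch applies the three inner-loop steps to a 16 × 16 block. The function name `sad_conventional`, the explicit initialization of the accumulator to zero, and the test blocks are assumptions made for the example and do not come from the specification.

```python
def sad_conventional(U, V):
    """Sum of absolute differences over a 16x16 block, following operation (1)."""
    sad = 0  # accumulator; operation (1) leaves its initial value implicit
    for x in range(16):
        for y in range(16):
            temp = U[x][y] - V[x][y]   # step 1: difference
            temp = abs(temp)           # step 2: absolute value
            sad = sad + temp           # step 3: accumulate
    return sad


# Hypothetical test blocks: U(x, y) = x and V(x, y) = y.
U = [[x for y in range(16)] for x in range(16)]
V = [[y for y in range(16)] for x in range(16)]

print(sad_conventional(U, V))  # sum of |x - y| over the block: 1360
```

Each inner iteration performs three operations (subtract, absolute value, add), which is the count the remainder of the description sets out to reduce.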

To increase computational speed, SAD operations can be implemented directly as a result of an operation of summing data registers; thus, each accumulation step of SAD(U,V) may be implemented as

Result = Operand x + |Operand y|    (2)

The operation defined by equation (2), referred to as “Add Absolute Value” (ADDABS), will, when implemented by a suitable instruction, increase the SAD calculation speed compared with the conventional implementation of operation (1), supra. Equation (2) can be rewritten as

if (Operand y) < 0                     (3)
  Result = Operand x + (−Operand y)
         = Operand x − Operand y
else
  Result = Operand x + Operand y

For 2's complement numbers, the sign of a binary number is given by the most significant bit (MSB) of the number. Thus, the “if” statement of the pseudocode of operation (3) can be written as

if (Operand y[MSB]) == 1               (4)
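The MSB test of statement (4) can be checked in software on 16-bit 2's complement bit patterns. The sketch below is a minimal model under stated assumptions: the 16-bit word size is the one used in the exemplary embodiment, while the helper names `to_word`, `from_word`, and `addabs` are hypothetical and are not part of the specification.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1        # 0xFFFF
MSB = 1 << (WORD_BITS - 1)         # 0x8000, the sign bit


def to_word(value):
    """Encode a Python int as a 16-bit 2's complement bit pattern."""
    return value & MASK


def from_word(word):
    """Decode a 16-bit 2's complement bit pattern into a Python int."""
    return word - (1 << WORD_BITS) if word & MSB else word


def addabs(operand_x, operand_y):
    """Return operand_x + |operand_y| on 16-bit words.

    The sign test is a single bit test on operand_y, as in statement (4);
    no separate absolute-value step is performed.
    """
    if operand_y & MSB:                        # MSB == 1: operand y is negative
        return (operand_x - operand_y) & MASK  # x - y
    return (operand_x + operand_y) & MASK      # MSB == 0: x + y


print(from_word(addabs(to_word(5), to_word(-3))))   # 5 + |-3| = 8
print(from_word(addabs(to_word(5), to_word(3))))    # 5 + |3|  = 8
print(from_word(addabs(to_word(-5), to_word(-3))))  # -5 + |-3| = -2
```

One edge case follows from the number format itself rather than from the technique: the most negative 16-bit value, −32768, has no representable absolute value, so the result wraps around in this model just as an ordinary 16-bit subtraction would.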

Implementing the “if” statement (4) into operation (1) and applying the ADDABS operation (2) yields

for x = 0 to x = 15                    (5)
  for y = 0 to y = 15
    temp = U(x, y) - V(x, y)
    ADDABS SAD, SAD, temp

This substitution yields a reduction from three operations currently used (operation (1)) down to two operations (operation (5)) in the normally time-intensive inner iteration loop.
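The following Python sketch mirrors operation (5) and checks it against a direct evaluation of the governing equation. It is an illustration only: `addabs` is a software stand-in for the proposed instruction, Python integers are unbounded (so no word-size wraparound is modeled), and the test blocks are hypothetical. The final lines work out the inner-loop operation count behind the "over 30%" figure.

```python
def addabs(rx, ry):
    """Software stand-in for ADDABS: returns rx + |ry| by testing the sign of ry."""
    return rx - ry if ry < 0 else rx + ry


def sad_addabs(U, V):
    """SAD over a 16x16 block with two operations per inner iteration, as in operation (5)."""
    sad = 0
    for x in range(16):
        for y in range(16):
            temp = U[x][y] - V[x][y]   # operation 1: difference
            sad = addabs(sad, temp)    # operation 2: ADDABS SAD, SAD, temp
    return sad


# Hypothetical test blocks with 8-bit pixel values.
U = [[(3 * x + 5 * y) % 256 for y in range(16)] for x in range(16)]
V = [[(7 * x + y) % 256 for y in range(16)] for x in range(16)]

# Direct evaluation of the governing equation for comparison.
reference = sum(abs(U[x][y] - V[x][y]) for x in range(16) for y in range(16))
assert sad_addabs(U, V) == reference

# Inner-loop operation counts for one 16x16 block.
ops_conventional = 3 * 16 * 16   # subtract, abs, add
ops_addabs = 2 * 16 * 16         # subtract, ADDABS
reduction = 1 - ops_addabs / ops_conventional
print(f"{ops_conventional} -> {ops_addabs} operations ({reduction:.1%} fewer)")
# prints "768 -> 512 operations (33.3% fewer)"
```

The count here is of inner-loop operations, not clock cycles; the actual speedup on a given microprocessor would also depend on loop overhead and memory access, which this sketch does not model.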

FIG. 1 implements a portion of the operation (5) in hardware and includes an x-operand register 101, a y-operand register 103, an arithmetic logic unit (ALU) 105, and a destination register, d, 107. An input to the ALU 105, “Add/Sub Control,” provides a determination whether an addition or subtraction operation should be performed. A “1” input (i.e., the MSB indicates that the unary operator in 2's complement is negative) enables a subtraction operation. Conversely, a “0” input (i.e., the MSB indicates that the unary operator in 2's complement is positive) enables an addition operation. Therefore, if the MSB of operand y is “1,” a resulting operation in the ALU 105 is x − y. If the MSB of operand y is “0,” a resulting operation in the ALU 105 is x + y.

In a specific exemplary embodiment, the ADDABS operation adds the contents of the x-operand register 101 to the absolute value of the contents of the y-operand register 103 and stores the result in the destination register 107. Thus, the operation is reduced to

Rd ← Rx + |Ry|    (6)

where Rx is the contents of the x-operand register 101, Ry is the contents of the y-operand register 103, and Rd are the data stored in the destination register 107. For a 16-bit word, operands for each register in equation (6) are x, y, d ∈ {0, 1, . . . , 15}. In this specific exemplary embodiment, syntax of operations for equation (6) is ADDABS Rd, Rx, Ry.

In a specific exemplary embodiment of FIG. 2, a multiplexer 201 is added to the general circuit configuration in accordance with FIG. 1. An input “addabs_ctrl” determines whether an ADDABS operation or a standard add/subtract operation should be performed. A “1” input will force a standard add or subtract operation, while a “0” input forces an ADDABS operation.

An input to the multiplexer 201, “addsub_ctrl,” determines whether an addition or subtraction should be performed. This input becomes valid if “addabs_ctrl” is set to “1.” Providing a “1” input to “addsub_ctrl” forces a subtraction operation, while a “0” forces an addition operation.

For example, if “addabs_ctrl” is set to “0,” the MSB of operand y determines whether an addition or subtraction is performed. If the MSB of operand y is “1,” x − y is performed. If the MSB of operand y is “0,” x + y is performed.
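The control behavior described for FIG. 2 can be summarized in a small behavioral model. The Python sketch below is an assumption-laden illustration, not the circuit itself: the function names `mux`, `alu`, and `datapath` are hypothetical, the 16-bit word size is taken from the exemplary embodiment, and the assignment of the MSB to multiplexer input 0 and “addsub_ctrl” to input 1 is chosen only so that the model reproduces the behavior stated in the text.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1


def mux(select, input0, input1):
    """2-to-1 multiplexer: returns input0 when select is 0, input1 when select is 1."""
    return input1 if select else input0


def alu(x, y, add_sub_enable):
    """ALU model: subtract (x - y) when the enable is 1, add (x + y) when it is 0."""
    return (x - y) & MASK if add_sub_enable else (x + y) & MASK


def datapath(x, y, addabs_ctrl, addsub_ctrl=0):
    """Model of the FIG. 2 arrangement on 16-bit 2's complement bit patterns.

    addabs_ctrl == 0: the MSB of y drives the ALU enable (ADDABS operation).
    addabs_ctrl == 1: addsub_ctrl drives the ALU enable (standard add/subtract).
    """
    msb_y = (y >> (WORD_BITS - 1)) & 1
    enable = mux(addabs_ctrl, msb_y, addsub_ctrl)
    return alu(x, y, enable)


MINUS_3 = (-3) & MASK  # 0xFFFD, the 16-bit pattern for -3

print(datapath(5, MINUS_3, addabs_ctrl=0))            # ADDABS: 5 + |-3| = 8
print(datapath(5, 3, addabs_ctrl=0))                  # ADDABS: 5 + |3|  = 8
print(datapath(5, 3, addabs_ctrl=1, addsub_ctrl=1))   # standard subtract: 2
print(datapath(5, 3, addabs_ctrl=1, addsub_ctrl=0))   # standard add: 8
```

In this model the multiplexer only selects which one-bit signal reaches the ALU enable, so the existing adder/subtractor is reused unchanged, which is consistent with the stated aim of modifying hardware already present.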

Although the exemplary embodiments are described in terms of a video CODEC system, a skilled artisan will recognize the value of increasing a calculation speed of SAD operations in other areas as well. For example, calculation of spatial statistics frequently employs sum-of-absolute-difference calculations; specifically, autocorrelation functions and power spectral density functions (e.g., as employed in communication theory) can benefit from the increased computational speed of the present invention described herein.

Further, one skilled in the art will recognize that other hardware implementations may readily be employed as well. For example, other hardware implementations could include ripple-carry adders or carry-lookahead adders in place of the arithmetic logic unit. Such combinatorial circuits are well known in the art and may be readily combined to accomplish the SAD operations described herein. Additionally, a skilled artisan will recognize that the present invention will work readily with other word sizes, such as, for example, 32-bit words. Therefore, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

1. A video conferencing motion estimation system, the system comprising: a microprocessor, the microprocessor being configured to operate as a compression and decompression (CODEC) device, the CODEC device including; a first operand register, the first operand register configured to store a first set of binary data as 2's complement; a second operand register, the second operand register configured to store a second set of binary data as 2's complement; an arithmetic logic unit, the arithmetic logic unit having a first input and a second input, the first input coupled to receive the first set of binary data and the second input coupled to receive the second set of binary data, the arithmetic logic unit further having an add/subtract enable input, the add/subtract enable input coupled to receive a most significant bit of the second set of binary data, the most significant bit representing a sign of the second set of binary data, the arithmetic logic unit being configured to add data input from the first and second operand registers if the most significant bit is of a first value and subtract the second set of binary data from the first set of binary data if the most significant bit is of a second value; a destination register, the destination register coupled to an output of the arithmetic logic unit to receive a result; and a multiplexer, the multiplexer having a first input and a second input, an output, and a select terminal, the output being coupled to the add/subtract enable input of the arithmetic logic unit, the first input being coupled to receive the most significant bit of the second set of binary data, and the second input being coupled to an add/subtract control line.
2. The system of claim 1 wherein the first value is a binary “0” and the second value is a binary “1.”
3. The system of claim 1 wherein a first select value applied to the select terminal enables the arithmetic logic unit to be able to perform an addition or subtraction operation and a second select value applied to the select terminal enables the arithmetic logic unit to perform an addabs operation, the addabs operation being defined as adding an absolute value of the second set of binary data to the first set of binary data.
4. The system of claim 1 wherein a first value on the add/subtract control line enables the arithmetic logic unit to perform an addition operation and a second value on the add/subtract control line enables the arithmetic logic unit to perform a subtraction operation.
5. A system for calculating sum-of-absolute-differences, the system comprising: a first operand register, the first operand register configured to store a first set of binary data as 2's complement; a second operand register, the second operand register configured to store a second set of binary data as 2's complement; an arithmetic logic unit, the arithmetic logic unit having a first input and a second input, the first input coupled to receive the first set of binary data and the second input coupled to receive the second set of binary data, the arithmetic logic unit further having an add/subtract enable input, the add/subtract enable input coupled to receive a bit of the second set of binary data, the bit representing a sign of the second set of binary data, the arithmetic logic unit being configured to add data input from the first and second operand registers if the bit is of a first value and subtract the second set of binary data from the first set of binary data if the most significant bit is of a second value; a destination register, the destination register coupled to an output of the arithmetic logic unit to receive a result; and a multiplexer, the multiplexer having a first input and a second input, an output, and a select terminal, the output being coupled to the add/subtract enable input of the arithmetic logic unit, the first input being coupled to receive the bit of the second set of binary data, and the second input being coupled to an add/subtract control line.
6. The system of claim 5 wherein the first value is a binary “0” and the second value is a binary “1.”
7. The system of claim 5 wherein a first select value applied to the select terminal enables the arithmetic logic unit to be able to perform an addition or subtraction operation and a second select value applied to the select terminal enables the arithmetic logic unit to perform an addabs operation, the addabs operation being defined as adding an absolute value of the second set of binary data to the first set of binary data.
8. The system of claim 5 wherein a first value on the add/subtract control line enables the arithmetic logic unit to perform an addition operation and a second value on the add/subtract control line enables the arithmetic logic unit to perform a subtraction operation.
9. A system for calculating sum-of-absolute-differences, the system comprising: a first operand register, the first operand register configured to store a first set of binary data as 2's complement; a second operand register, the second operand register configured to store a second set of binary data as 2's complement; an arithmetic logic unit, the arithmetic logic unit having a first input and a second input, the first input coupled to receive the first set of binary data and the second input coupled to receive the second set of binary data, the arithmetic logic unit further having an add/subtract enable input, the add/subtract enable input coupled to receive a most significant bit of the second set of binary data, the most significant bit representing a sign of the second set of binary data, the arithmetic logic unit being configured to add data input from the first and second operand registers if the most significant bit is of a first value and subtract the second set of binary data from the first set of binary data if the most significant bit is of a second value; a destination register, the destination register coupled to an output of the arithmetic logic unit to receive a result; and a multiplexer, the multiplexer having a first input and a second input, an output, and a select terminal, the output being coupled to the add/subtract enable input, the first input being coupled to receive the most significant bit of the second set of binary data, the second input being coupled to an add/subtract control line.
10. The system of claim 9 wherein a first select value applied to the select terminal enables the arithmetic logic unit to be able to perform an addition or subtraction operation and a second select value applied to the select terminal enables the arithmetic logic unit to perform an addabs operation, the addabs operation being defined as adding an absolute value of the second set of binary data to the first set of binary data.
11. The system of claim 9 wherein a first value on the add/subtract control line enables the arithmetic logic unit to perform an addition operation and a second value on the add/subtract control line enables the arithmetic logic unit to perform a subtraction operation.
12. A system for calculating sum-of-absolute-differences, the system comprising: a first operand register, the first operand register configured to store a first set of binary data as 2's complement; a second operand register, the second operand register configured to store a second set of binary data as 2's complement; an arithmetic calculation means for accepting a most significant bit of the second data set and adding the first and second binary data sets directly if the most significant bit is a first value and subtracting the second set of binary data from the first set of binary data if the most significant bit is of a second value; a destination register, the destination register coupled to an output of the arithmetic logic unit to receive a result; and a multiplexer, the multiplexer having a first input and a second input, an output, and a select terminal, the output being coupled to arithmetic calculation means, the first input being coupled to receive the most significant bit of the second set of binary data, the second input being coupled to an add/subtract control line.
13. The system of claim 12 wherein a first select value applied to the select terminal enables the arithmetic calculation means to be able to perform an addition or subtraction operation and a second select value applied to the select terminal enables the arithmetic calculation means to perform an addabs operation, the addabs operation being defined as adding an absolute value of the second set of binary data to the first set of binary data.
14. The system of claim 12 wherein a first value on the add/subtract control line enables the arithmetic calculation means to perform an addition operation and a second value on the add/subtract control line enables the arithmetic calculation means to perform a subtraction operation.
15. The system of claim 12, wherein the arithmetic calculation means is an arithmetic logic unit.
16. The system of claim 12, wherein the arithmetic calculation means is a ripple-carry adder.
17. The system of claim 12, wherein the arithmetic calculation means is a carry lookahead adder.